8 research outputs found

    Discovery of sensitive data with natural language processing

    Protecting sensitive data is becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. Efforts to build automatic systems are ongoing, but in most cases the underlying processes are still manual or semi-automatic. In this work, we developed a component that extracts and classifies sensitive data from unstructured text in European Portuguese. The objective was to create a system that allows organizations to understand their data and meet legal compliance and security requirements. We studied a hybrid approach to Named Entity Recognition for the Portuguese language. This approach combines several techniques: rule-based and lexicon-based models, machine learning algorithms, and neural networks. The rule-based and lexicon-based approaches were used only for a specific set of classes. For the remaining entity classes, the SpaCy and Stanford NLP tools were tested, two statistical models (Conditional Random Fields and Random Forest) were implemented, and, finally, a Bidirectional LSTM approach was tested. Among the tools, the best results were achieved with the Stanford NER model (86.41%) from the Stanford NLP tool. Among the statistical models, Conditional Random Fields obtained the best results, with an F1-score of 65.50%. With the Bi-LSTM approach, we achieved an F1-score of approximately 83.01%. The corpora used for training and testing were the HAREM Golden Collection, the SIGARRA News Corpus, and the DataSense NER Corpus.

    MAMMALS IN PORTUGAL: A data set of terrestrial, volant, and marine mammal occurrences in Portugal

    Mammals are threatened worldwide, with 26% of all species included in the IUCN threatened categories. This overall pattern is primarily associated with habitat loss or degradation and human persecution for terrestrial mammals, and with pollution, open net fishing, climate change, and prey depletion for marine mammals. Mammals play a key role in maintaining ecosystem functionality and resilience, and therefore information on their distribution is crucial to delineate and support conservation actions. MAMMALS IN PORTUGAL is a publicly available data set compiling unpublished georeferenced occurrence records of 92 terrestrial, volant, and marine mammals in mainland Portugal and the archipelagos of the Azores and Madeira. It includes 105,026 data entries between 1873 and 2021 (72% of the data occurring in 2000 and 2021). The methods used to collect the data were live observations/captures (43%), sign surveys (35%), camera trapping (16%), bioacoustics surveys (4%), and radiotracking and inquiries, which represent less than 1% of the records. The data set includes 13 types of records: (1) burrows | soil mounds | tunnel, (2) capture, (3) colony, (4) dead animal | hair | skulls | jaws, (5) genetic confirmation, (6) inquiries, (7) observation of live animal, (8) observation in shelters, (9) photo trapping | video, (10) predator diet | pellets | pine cones/nuts, (11) scat | track | ditch, (12) telemetry, and (13) vocalization | echolocation. The spatial uncertainty of most records (76%) ranges between 0 and 100 m. Rodentia (n = 31,573) has the highest number of records, followed by Chiroptera (n = 18,857), Carnivora (n = 18,594), Lagomorpha (n = 17,496), Cetartiodactyla (n = 11,568), and Eulipotyphla (n = 7,008). The data set includes records of species classified by the IUCN as threatened (e.g., Oryctolagus cuniculus [n = 12,159], Monachus monachus [n = 1,512], and Lynx pardinus [n = 197]).
We believe that this data set may stimulate the publication of data sets from other European countries, which would contribute to ecology and conservation-related research and thereby assist in the development of more accurate and tailored conservation management strategies for each species. There are no copyright restrictions; please cite this data paper when the data are used in publications.

    Mammals in Portugal: a data set of terrestrial, volant, and marine mammal occurrences in Portugal

    Mammals are threatened worldwide, with ~26% of all species included in the IUCN threatened categories. This overall pattern is primarily associated with habitat loss or degradation and human persecution for terrestrial mammals, and with pollution, open net fishing, climate change, and prey depletion for marine mammals. Mammals play a key role in maintaining ecosystem functionality and resilience, and therefore information on their distribution is crucial to delineate and support conservation actions. MAMMALS IN PORTUGAL is a publicly available data set compiling unpublished georeferenced occurrence records of 92 terrestrial, volant, and marine mammals in mainland Portugal and the archipelagos of the Azores and Madeira. It includes 105,026 data entries between 1873 and 2021 (72% of the data occurring in 2000 and 2021). The methods used to collect the data were live observations/captures (43%), sign surveys (35%), camera trapping (16%), bioacoustics surveys (4%), and radiotracking and inquiries, which represent less than 1% of the records. The data set includes 13 types of records: (1) burrows | soil mounds | tunnel, (2) capture, (3) colony, (4) dead animal | hair | skulls | jaws, (5) genetic confirmation, (6) inquiries, (7) observation of live animal, (8) observation in shelters, (9) photo trapping | video, (10) predator diet | pellets | pine cones/nuts, (11) scat | track | ditch, (12) telemetry, and (13) vocalization | echolocation. The spatial uncertainty of most records (76%) ranges between 0 and 100 m. Rodentia (n = 31,573) has the highest number of records, followed by Chiroptera (n = 18,857), Carnivora (n = 18,594), Lagomorpha (n = 17,496), Cetartiodactyla (n = 11,568), and Eulipotyphla (n = 7,008). The data set includes records of species classified by the IUCN as threatened (e.g., Oryctolagus cuniculus [n = 12,159], Monachus monachus [n = 1,512], and Lynx pardinus [n = 197]).
We believe that this data set may stimulate the publication of data sets from other European countries, which would contribute to ecology and conservation-related research and thereby assist in the development of more accurate and tailored conservation management strategies for each species. There are no copyright restrictions; please cite this data paper when the data are used in publications.

    Innovation in Teams: The Role of Psychological Capital and Team Learning

    No full text
    The main purpose of the present research was to analyze the relationship between team psychological capital and innovation, considering team learning as a mediating variable. A field survey was carried out covering 124 work teams from organizations in different sectors of activity. Hypotheses were tested through PROCESS. Results supported a direct positive relationship between team psychological capital and team innovation, as well as an indirect influence of team psychological capital on team innovation through team learning. The findings highlight the role of team learning as an intervening process between team psychological capital and team innovation. Accordingly, managers should seek to develop team psychological capital and learning behaviors among their teams to promote innovation.

    C. Literaturwissenschaft.

    No full text